Indexing Boolean Expressions

Authors

  • Steven Euijong Whang
  • Chad Brower
  • Jayavel Shanmugasundaram
  • Sergei Vassilvitskii
  • Erik Vee
  • Ramana Yerneni
  • Hector Garcia-Molina
Abstract

We consider the problem of efficiently indexing Disjunctive Normal Form (DNF) and Conjunctive Normal Form (CNF) Boolean expressions over a high-dimensional multi-valued attribute space. The goal is to rapidly find the set of Boolean expressions that evaluate to true for a given assignment of values to attributes. A solution to this problem has applications in online advertising (where a Boolean expression represents an advertiser’s user targeting requirements, and an assignment of values to attributes represents the characteristics of a user visiting an online page) and, more generally, in any publish/subscribe system (where a Boolean expression represents a subscription, and an assignment of values to attributes represents an event). All existing solutions that we are aware of can only index a specialized subset of conjunctive and/or disjunctive expressions, and cannot efficiently handle general DNF and CNF expressions (including NOTs) over multi-valued attributes. In this paper, we present a novel solution based on the inverted list data structure that enables us to index arbitrarily complex DNF and CNF Boolean expressions over multi-valued attributes. An interesting aspect of our solution is that, by virtue of leveraging inverted lists traditionally used for ranked information retrieval, we can efficiently return the top-N matching Boolean expressions. This capability enables emerging applications such as ranked publish/subscribe systems [16], where only the top subscriptions that match an event are desired. For example, in online advertising there is a limit on the number of advertisements that can be shown on a given page and only the “best” advertisements can be displayed. We have evaluated our proposed technique based on data from an online advertising application, and the results show a dramatic performance improvement over prior techniques.
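To make the matching problem in the abstract concrete, here is a minimal sketch of an inverted index over DNF expressions built from IN / NOT IN predicates. It is a simplified counting scheme for illustration only, not the algorithm from the paper; the class name `ConjunctionIndex`, its methods, and the example ads are all hypothetical. It assumes each attribute takes a single value in a given assignment and that an expression matches when any one of its conjunctions is satisfied.

```python
from collections import defaultdict


class ConjunctionIndex:
    """Toy inverted index over conjunctions of IN / NOT IN predicates (illustrative only)."""

    def __init__(self):
        # (attribute, value) -> list of (conjunction id, True for IN / False for NOT IN)
        self.postings = defaultdict(list)
        self.required = {}    # conjunction id -> number of IN predicates it contains
        self.owner = {}       # conjunction id -> id of the DNF expression it belongs to
        self.zero_in = set()  # conjunctions made only of NOT IN predicates

    def add(self, expr_id, conjunctions):
        """Index a DNF expression given as a list of conjunctions.

        Each conjunction is a list of (attribute, op, values) with op "in" or "not in".
        """
        for preds in conjunctions:
            conj_id = len(self.owner)
            self.owner[conj_id] = expr_id
            need = sum(1 for _, op, _ in preds if op == "in")
            self.required[conj_id] = need
            if need == 0:
                self.zero_in.add(conj_id)
            for attr, op, values in preds:
                for value in set(values):
                    self.postings[(attr, value)].append((conj_id, op == "in"))

    def match(self, assignment):
        """Return ids of expressions that evaluate to true for {attribute: value}."""
        hits = defaultdict(int)  # conjunction id -> satisfied IN predicates
        rejected = set()         # conjunctions violated through a NOT IN predicate
        for attr, value in assignment.items():
            for conj_id, is_in in self.postings.get((attr, value), ()):
                if is_in:
                    hits[conj_id] += 1
                else:
                    rejected.add(conj_id)
        matched = {
            self.owner[c]
            for c, count in hits.items()
            if count == self.required[c] and c not in rejected
        }
        matched |= {self.owner[c] for c in self.zero_in if c not in rejected}
        return matched


index = ConjunctionIndex()
# ad1: (age IN {3} AND state IN {NY}) OR (state IN {CA} AND gender IN {M})
index.add("ad1", [
    [("age", "in", [3]), ("state", "in", ["NY"])],
    [("state", "in", ["CA"]), ("gender", "in", ["M"])],
])
# ad2: age IN {3, 4} AND state NOT IN {CA}
index.add("ad2", [[("age", "in", [3, 4]), ("state", "not in", ["CA"])]])

print(sorted(index.match({"age": 3, "state": "NY", "gender": "F"})))  # ['ad1', 'ad2']
```

Only the posting lists for the attribute-value pairs present in the assignment are touched, which is the property that makes an inverted-list layout attractive when most expressions mention only a few of the many attributes. Ranking the matches to return a top-N result, as the abstract describes, is not covered by this sketch.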


Similar resources

Analysis and Optimization for Boolean Expression Indexing

BE-Tree is a novel dynamic tree data structure designed to efficiently index Boolean expressions over a high-dimensional discrete space. BE-Tree copes with both high-dimensionality and expressiveness of Boolean expressions by introducing a two-phase space-cutting technique that specifically utilizes the discrete and finite domain properties of the space. Furthermore, BE-Tree employs self-adjustm...


Basic Goals: Separation of Concerns. Generate efficient code sequences for individual operations. Keep it fast and simple: leave most optimizations to later phases. Provide clean, easy-to-optimize code. IR forms the basis for code optimization and target code generation.

Assumptions: Intermediate language: RISC-like 3-address code (ILOC: Cooper and Torczon, Appendix A). Intermediate Code Generation (ICG) is independent of target ISA. Storage layout has been pre-determined. Infinite number of registers + Frame Pointer (FP). Q. What values can live in registers? Strategy: 1. Simple bottom-up tree-walk on AST. 2. Translation uses only local info: current AST node + chi...


Semantic Indexing Using WordNet Senses

We describe in this paper a boolean Information Retrieval system that adds word semantics to the classic word based indexing. Two of the main tasks of our system, namely the indexing and retrieval components, are using a combined word-based and sense-based approach. The key to our system is a methodology for building semantic representations of open text, at word and collocation level. This ne...


BLOSOM: A Framework for Mining Boolean Expressions

We introduce a novel framework, called BLOSOM, for mining (frequent) boolean expressions over binary-valued datasets. We organize the space of boolean expressions into four categories: pure conjunctions, pure disjunctions, conjunction of disjunctions, and disjunction of conjunctions. We focus on mining the simplest expressions (the minimal generators) for each class. We also propose a closure op...



Journal title:
  • PVLDB

Volume 2  Issue 

Pages  -

Publication date 2009